Existing document similarity calculation methods based on siamese networks treat the entire document as a single input sequence, which may lead to sparse data. To address this, a hierarchical attention mechanism is used to improve the document representation in the siamese network. Because the siamese network model based on hierarchical attention may still ignore important sentences in the input document, a two-step document similarity calculation method that first compresses the document content is further proposed. Experimental results show that the proposed method clearly outperforms the siamese network model based on Long Short-Term Memory (LSTM).
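The two-step idea (compress the document, then compare) can be illustrated with a minimal sketch. It is not the model from the abstract: the bag-of-words encoder `bow`, the centroid-based salience weights in `compress`, and the `keep` parameter are all assumptions standing in for the learned siamese encoder and hierarchical attention.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector (assumed stand-in for a learned encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def softmax(scores):
    """Normalize raw scores into attention-like weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def compress(sentences, keep):
    """Step 1: keep the `keep` sentences with the highest weight.

    The weight is similarity to the whole document, an assumed proxy
    for sentence-level attention.
    """
    doc_vec = bow(" ".join(sentences))
    weights = softmax([cosine(bow(s), doc_vec) for s in sentences])
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore the original sentence order
    return [sentences[i] for i in kept]

def document_similarity(doc_a, doc_b, keep=2):
    """Step 2: encode both compressed documents with the same encoder and compare."""
    vec_a = bow(" ".join(compress(doc_a, keep)))
    vec_b = bow(" ".join(compress(doc_b, keep)))
    return cosine(vec_a, vec_b)
```

Both documents pass through the same `compress` and `bow` functions, which mirrors the weight sharing of a siamese architecture; in a trained model the weights would come from attention parameters rather than from centroid similarity.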
To solve the classification of the “de” structure involving semantic ellipsis, a hybrid neural network is built. First, the network uses a bidirectional LSTM (long short-term memory) neural network to learn syntactic and semantic information about the “de” structure. Then, it employs either a max-pooling layer or GRU (gated recurrent unit) based multiple attention layers to capture ellipsis features, which allow the network to recognize “de” structures involving semantic ellipsis. Experiments on the CTB8.0 corpus show that the proposed approach achieves accurate results efficiently, with an F1 value of 96.67%.
To address the efficiency and effectiveness problems of traditional similar spatial textual object retrieval, a semantic-aware strategy is proposed that retrieves the top-k similar spatial textual objects effectively and efficiently. The retrieval strategy is built on a common framework for spatial object retrieval and meets users' requirements for both efficiency and effectiveness. Extensive experimental evaluation demonstrates that the proposed method outperforms the state-of-the-art approach.
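As a rough illustration of top-k retrieval over spatial textual objects, the sketch below ranks objects by a weighted sum of spatial proximity and textual similarity. The abstract does not give the scoring function or the index structure, so everything here is an assumption: the `SpatialTextObject` type, the Jaccard overlap used in place of a semantic similarity, the `alpha` and `max_dist` parameters, and the linear scan instead of a spatial index.

```python
import heapq
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SpatialTextObject:
    """Hypothetical object: a location plus a set of keywords."""
    name: str
    x: float
    y: float
    keywords: frozenset

def jaccard(a, b):
    """Keyword overlap (assumed stand-in for a semantic similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score(query, obj, max_dist, alpha):
    """Weighted sum of spatial proximity and textual similarity, both in [0, 1]."""
    dist = math.hypot(query.x - obj.x, query.y - obj.y)
    spatial = 1.0 - min(dist / max_dist, 1.0)
    textual = jaccard(query.keywords, obj.keywords)
    return alpha * spatial + (1.0 - alpha) * textual

def top_k(query, objects, k, max_dist=10.0, alpha=0.5):
    """Return the k highest-scoring objects by scanning all candidates."""
    return heapq.nlargest(k, objects, key=lambda o: score(query, o, max_dist, alpha))

# Usage with made-up data
query = SpatialTextObject("q", 0.0, 0.0, frozenset({"coffee", "wifi"}))
objects = [
    SpatialTextObject("a", 1.0, 0.0, frozenset({"coffee", "wifi"})),
    SpatialTextObject("b", 0.0, 0.0, frozenset({"pizza"})),
    SpatialTextObject("c", 9.0, 0.0, frozenset({"coffee"})),
]
results = top_k(query, objects, k=2)
```

A linear scan costs one score evaluation per object; an efficient strategy of the kind the abstract describes would typically prune candidates with an index so that most objects are never scored.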
To address sense guessing for Chinese unknown words, semantic dictionaries at different levels were introduced by applying the “Semantic Knowledge-base of Modern Chinese”. Sense-guessing models were constructed using these dictionaries, and the models were integrated to predict the senses of unknown words, which yielded better performance. Using these models, semantic prediction and annotation of the unknown words in the People’s Daily published in 2000 were evaluated. Finally, corpus resources with sense annotations of unknown words were obtained.
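One simple way to integrate several dictionary-based models is majority voting, sketched below. The abstract does not say how the models were combined, so the voting rule is an assumption, as are the helper names `make_dictionary_model` and `integrate` and the toy dictionaries, which use English strings purely for illustration.

```python
from collections import Counter

def make_dictionary_model(dictionary, key_fn):
    """Build a model that looks up a key derived from the word.

    Returns None when the key is not in the dictionary.
    """
    def model(word):
        return dictionary.get(key_fn(word))
    return model

def integrate(models, word):
    """Majority vote over the models that return a prediction."""
    votes = Counter(p for p in (m(word) for m in models) if p is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Toy dictionaries at different granularities (hypothetical entries)
models = [
    make_dictionary_model({"er": "person"}, lambda w: w[-2:]),  # two-character suffix
    make_dictionary_model({"r": "person"}, lambda w: w[-1:]),   # one-character suffix
    make_dictionary_model({"t": "action"}, lambda w: w[:1]),    # first character
]
guess = integrate(models, "teacher")
```

Here two of the three models vote for the same class, so the combined guess follows them; a word that no dictionary covers yields `None` rather than a forced prediction.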